Pattern recognition method and apparatus

ABSTRACT

A pattern recognition method and apparatus decrease the amount of computation for pattern recognition and adapt flexibly to an increase and a change in learning samples. Base vectors in a subspace of each category and a kernel function are learned beforehand. Pattern data to be recognized is inputted, and the projection of the input pattern to a nonlinear subspace of each category is decided. Based on the decided projection, a Euclidean distance or an evaluation value related to each category is calculated from the property of the kernel function, and is compared with a threshold value. If a category for which the evaluation value is below the threshold value exists, the category for which the evaluation value is the smallest is outputted as the recognition result. If there is no category for which the evaluation value is below the threshold value, a teaching signal is inputted for additional learning.

BACKGROUND OF THE INVENTION

[0001] 1. Field of the Invention

[0002] The present invention relates to a pattern recognition technique that uses a computer to judge to which of plural categories an input image and other data belong.

[0003] 2. Description of the Related Art

[0004] Conventionally, pattern recognition methods by use of a computer have been proposed. For example, the following research is conducted. The image of a human face captured by an image sensor is analyzed to judge which portions of the image correspond to an eye, a nose, and others, categorize them, and determine to whose face the image corresponds by comparing the image with categorized plural images stored in advance.

[0005] A subspace method is well known as a pattern recognition method. According to the subspace method, a subspace is defined for each of plural categories, and by determining with which subspace an unknown pattern has the highest degree of relation, a category to which the unknown pattern belongs is determined. In the subspace method, where there are many categories, recognition accuracy becomes low, and it also becomes low for nonlinear pattern distributions.

[0006] Another well-known recognition method is the support vector machine (SVM) method. In the SVM method, by introducing a kernel function, low-dimension patterns are turned into high-dimension patterns, and nonlinear pattern distributions can also be recognized. However, the number of categories to which the method is applied is no more than two and an enormous amount of computation is required.

[0007] Recently, the kernel nonlinear subspace method has been proposed which combines the advantages of the subspace method and the advantages of the SVM method (Japanese Published Unexamined Patent Application No. 2000-90274). In the kernel nonlinear subspace method, patterns to be recognized are mapped to a high-dimension nonlinear space by using nonlinear conversion definable by a kernel function to create high-dimension patterns, as in the SVM method, and pattern recognition is performed by applying the subspace method on the high-dimension nonlinear space.

[0008] The kernel nonlinear subspace method, to define a subspace of a category i, creates base vectors by linear combination of mappings of all learning samples to a nonlinear space. Herein, as a method of calculating an evaluation value for judging whether an unknown input pattern belongs to a category, a method is disclosed which utilizes projection components produced when a pattern to be recognized is projected to subspaces on a high-dimension linear space that correspond to categories. Since the subspaces are defined by linearly combining base vectors produced using learning samples, which are low-dimension vectors, the projection components to be obtained to recognize input patterns can be calculated simply by calculating the low-dimension vectors by use of a kernel function.

[0009] However, since the computation includes kernel operations between the pattern to be recognized and all learning samples, and inner product operations with the number of all learning samples as a dimension count, when the number of learning samples increases, the amount of computation would increase in proportion to the number of learning samples. Moreover, since all learning samples must be saved for kernel computation, there has been a problem in that a large storage area is required.

[0010] Since a learning process is performed by singular value decomposition of a kernel matrix with the results of kernel operations between learning samples as components, there is a problem in that, when a learning sample is newly added, learning must be performed again using existing learning samples to recalculate the weight of linear combination of base vectors constituting a subspace.

[0011] A kernel function must have been selected beforehand for recognition targets and cannot be adaptively changed depending on learning, posing the problem that recognition accuracy does not increase.

SUMMARY OF THE INVENTION

[0012] The present invention intends to provide a pattern recognition technique that decreases the amount of computation for pattern recognition or can flexibly adapt to an increase and a change in learning samples.

[0013] According to one aspect of the present invention, a pattern recognition method includes: an evaluation value calculating step for using a set of vectors obtained by mapping a set of vectors conformed to at least one learning sample in an input space respectively to a nonlinear space defined by a kernel function as a set of base vectors constituting a subspace in the nonlinear space, defined for each of categories into which a pattern is classified, to calculate an evaluation value representative of a relation between the plural subspaces represented by linear combination of corresponding sets of the base vectors, and mapping of an unknown input pattern to the nonlinear space; and a category recognition step for recognizing a category to which the unknown input pattern belongs, based on the evaluation value.

[0014] In this configuration, vectors obtained by mapping vectors (hereinafter referred to as preimages or preimage vectors) conformed to learning samples in an input space to nonlinear spaces defined in kernel spaces are used as base vectors constituting the nonlinear spaces. Therefore, an evaluation value showing a relation between the unknown pattern and the subspaces can be calculated using the preimage vectors, preventing the amount of computation from increasing because of an increase in the number of learning samples.

[0015] The term “conforming” also implies the meaning of “approximating” or “representing”.

[0016] According to another aspect of the present invention, a method of learning base vectors constituting subspaces used for pattern recognition includes: a projection decision step that, for each mapping of a learning pattern to a nonlinear space defined by a kernel function, decides projection to a subspace that corresponds to a category into which the learning pattern is classified, and is represented by linear combination of a set of base vectors respectively created by a mapping of a set of vectors in an input space to the nonlinear space; and a vector updating step that updates the vectors in the input space in which the base vectors are created, to increase a relation between the mapping of the learning pattern obtained by the decided projection to the nonlinear space and the subspace corresponding to the category into which the learning pattern is classified.

[0017] In this configuration, the base vectors constituting the subspace can be updated according to a new learning pattern without having to hold previous learning patterns.

[0018] According to another aspect of the present invention, a method of deforming a kernel function includes the steps of: setting a kernel function for defining the mapping of a pattern in an input space to a nonlinear space which includes subspaces each defined for each of categories to which the pattern is classified; calculating a relation between the mapping of a learning pattern in the input space to the nonlinear space and the subspaces; and deforming the kernel function according to the result of the calculation of the relation.

[0019] In this configuration, variations in projection components to base vectors are calculated using a learning pattern, and a kernel function can be adaptively changed based on the calculation result.

[0020] The present invention can be implemented as a method, apparatus, or system, or can be implemented as a storage medium storing a program in at least a part thereof.

[0021] The above-described aspects and other aspects of the present invention will be described in claims and described below in detail using the accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

[0022] Preferred embodiments of the present invention will be described in detail based on the following drawings, wherein:

[0023] FIG. 1 is a flowchart of recognition processing in an embodiment of the present invention;

[0024] FIG. 2 is a flowchart of learning in an embodiment of the present invention;

[0025] FIG. 3 is an example of a learning sample in an embodiment of the present invention;

[0026] FIG. 4 is an example of recognition results in an embodiment of the present invention;

[0027] FIG. 5 illustrates comparison of computation time between an embodiment of the present invention and a conventional example; and

[0028] FIG. 6 is a block diagram showing a variation of an embodiment of the present invention.

DESCRIPTION OF THE PREFERRED EMBODIMENTS

Pattern Recognition

[0029] The principle of pattern recognition by the present invention will be described using a specific example. An arbitrary d-dimensional input pattern x and nonlinear conversion Φ that projects the input pattern x to a d_(Φ)-dimensional high-dimension characteristic space are defined, and pattern recognition is performed in the d_(Φ)-dimensional high-dimension characteristic space.

[0030] The mapping of the d-dimensional input pattern x by the nonlinear conversion Φ that is related to a kernel function

$K(x, y) = \sum_{i=1}^{d_{\Phi}} \lambda_{i} \varphi_{i}(x) \varphi_{i}(y) \qquad (1)$

[0031] with

[0032] φ₁(x), . . . , φ_(d_(Φ))(x)

[0033] as eigenfunctions is represented as

$x = (x_{1}, \ldots, x_{d}) \rightarrow \Phi(x) = \left( \sqrt{\lambda_{1}}\, \varphi_{1}(x), \ldots, \sqrt{\lambda_{d_{\Phi}}}\, \varphi_{d_{\Phi}}(x) \right) \qquad (2)$

[0034] using the kernel eigenfunctions, where λ_(i) is an eigenvalue for an eigenfunction φ_(i)(x).

[0035] Next, for a subspace Ω of a high-dimension characteristic space represented by linear combination of nonlinear conversion images Φ(x₁), . . . , Φ(x_(n)) of n vectors x₁, . . . , x_(n) of d-dimension input space, a subspace of the high-dimension characteristic space is provided for a category to which the pattern belongs. The nonlinear conversion images Φ(x₁), . . . , Φ(x_(n)) are referred to as base vectors of the subspace Ω, and n vectors x₁, . . . , x_(n) in an input space are referred to as preimages of the base vectors.

[0036] The preimages x₁, . . . , x_(n) are vectors that have been learned by a learning method described later, and have the characteristics of plural learning patterns, and each of the vectors is an approximation of plural learning patterns or is representative of plural learning patterns. The base vectors of the subspace Ω do not necessarily need to have an orthogonal relation with each other. The number of preimages, that is, the number n of base vectors of the subspace Ω is a preset parameter that influences recognition capability.
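To make the quantities above concrete, the following minimal sketch evaluates two common kernels and the n by n table of values K(x_(i), x_(j)) over the preimages of one subspace. The function names, the width `sigma`, the parameters `degree` and `c`, and the sample preimages are assumptions chosen for illustration and are not taken from the original text.

```python
import math

def gaussian_kernel(x, y, sigma=1.0):
    """K(x, y) = exp(-||x - y||^2 / (2 sigma^2)); sigma is an assumed width."""
    sq = sum((p - q) ** 2 for p, q in zip(x, y))
    return math.exp(-sq / (2.0 * sigma ** 2))

def polynomial_kernel(x, y, degree=2, c=1.0):
    """K(x, y) = (x . y + c)^degree; degree and c are assumed parameters."""
    return (sum(p * q for p, q in zip(x, y)) + c) ** degree

def gram_matrix(preimages, kernel):
    """Values K(x_i, x_j) for all pairs of preimages of one subspace."""
    return [[kernel(p, q) for q in preimages] for p in preimages]

# Hypothetical preimages of a subspace with n = 3 base vectors in d = 2.
preimages = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
G = gram_matrix(preimages, gaussian_kernel)
```

Only the n preimages enter this table, so its size is independent of the number of learning samples.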

[0037] Projection of a nonlinear conversion image Φ(x) of the input pattern x to the subspace Ω is represented by a vector,

$\tilde{\Phi}(x)$

[0038] which is the vector within the subspace Ω with minimal Euclidean distance from the nonlinear conversion image Φ(x).

$\tilde{\Phi}(x)$

[0039] is decided by finding a coefficient a_(i) which minimizes

$E = \left\| \Phi(x) - \tilde{\Phi}(x) \right\|^{2} = \left\| \Phi(x) - \sum_{i=1}^{n} a_{i} \Phi(x_{i}) \right\|^{2} \qquad (3)$

[0040] The expression (3) is represented as

$E = K(x, x) - 2 \sum_{i=1}^{n} a_{i} K(x, x_{i}) + \sum_{i=1}^{n} \sum_{j=1}^{n} a_{i} a_{j} K(x_{i}, x_{j}) \qquad (4)$

[0041] using the property of the kernel function given by the expression (1). a_(i) is obtained by the steepest descent method, starting from its initial value and serially changing it by the amount given by the following expression.

$\Delta a_{i} = -\xi \frac{\partial E}{\partial a_{i}} = 2 \xi \left\{ K(x, x_{i}) - \sum_{j=1}^{n} a_{j} K(x_{i}, x_{j}) \right\} \qquad (5)$

[0042] ξ is a small positive constant.
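As a check on the expression (5), the gradient of the expression (4) can be written out term by term, taking the steepest descent step against the gradient. The following worked step assumes a symmetric kernel, K(x_(i), x_(j)) = K(x_(j), x_(i)); the vector k and the matrix G are shorthand introduced here for illustration and do not appear in the original text.

```latex
\frac{\partial E}{\partial a_i}
  = -2 K(x, x_i) + 2 \sum_{j=1}^{n} a_j K(x_i, x_j),
\qquad
-\xi \frac{\partial E}{\partial a_i}
  = 2 \xi \left\{ K(x, x_i) - \sum_{j=1}^{n} a_j K(x_i, x_j) \right\}

% Stationary point, with k_i = K(x, x_i) and G_{ij} = K(x_i, x_j):
\frac{\partial E}{\partial a_i} = 0 \ \text{for all } i
\iff G a = k
```

The update therefore stops changing a_(i) exactly when the coefficients solve this n-dimensional linear system, which involves only the n preimages.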

[0043] E obtained by assigning a_(i) finally obtained to the expression (4) denotes the distance between the nonlinear conversion image Φ(x) of the input pattern x and the subspace Ω. Specifically, E for a subspace corresponding to each category is computed, and it is judged that the input pattern belongs to the category where E is minimum.
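The computation in the expressions (4) and (5) can be sketched as follows. This is a minimal illustration, not the implementation of the invention: it assumes a Gaussian kernel, a fixed number of descent steps, zero initial coefficients, and hypothetical names (`project`, `classify`, `xi`, `steps`) and sample preimages.

```python
import math

def gaussian_kernel(x, y, sigma=1.0):
    """Assumed kernel for the sketch; any kernel K(x, y) could be passed in."""
    sq = sum((p - q) ** 2 for p, q in zip(x, y))
    return math.exp(-sq / (2.0 * sigma ** 2))

def evaluation_value(kxx, kx, gram, a):
    """Expression (4): squared distance E for the coefficients a."""
    n = len(a)
    cross = sum(a[i] * kx[i] for i in range(n))
    quad = sum(a[i] * a[j] * gram[i][j] for i in range(n) for j in range(n))
    return kxx - 2.0 * cross + quad

def project(x, preimages, kernel, xi=0.1, steps=500):
    """Steepest descent on a_i by expression (5); returns (a, E)."""
    n = len(preimages)
    kx = [kernel(x, p) for p in preimages]                     # K(x, x_i)
    gram = [[kernel(p, q) for q in preimages] for p in preimages]  # K(x_i, x_j)
    a = [0.0] * n                                              # assumed start
    for _ in range(steps):
        a = [a[i] + 2.0 * xi * (kx[i] - sum(a[j] * gram[i][j] for j in range(n)))
             for i in range(n)]
    return a, evaluation_value(kernel(x, x), kx, gram, a)

def classify(x, subspaces, kernel):
    """Pick the category whose subspace gives the minimum E."""
    scores = {c: project(x, pre, kernel)[1] for c, pre in subspaces.items()}
    return min(scores, key=scores.get), scores

# Hypothetical preimages for two categories.
subspaces = {"A": [[0.0, 0.0], [1.0, 0.0]], "B": [[5.0, 5.0], [6.0, 5.0]]}
label, scores = classify([0.2, 0.1], subspaces, gaussian_kernel)
```

The step size `xi` must be small relative to the largest eigenvalue of the table K(x_(i), x_(j)) for the descent to converge; the value used here is only suitable for this small example.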

[0044] The amount of computation depends on the number n of (preimage) vectors in an input space characterizing subspaces and the number d of dimensions, and will not increase in proportion to the number of learning samples. Since K(x_(i),x_(j)) in the expressions (4) and (5) is unchanged after learning is completed, it has to be computed only once. Learning samples do not need to be saved, contributing to reduction in storage capacity.

[0045] The amount of computation required for pattern recognition is very small because computation in a high dimensional non-linear space is not required. Since the steepest descent method also permits operations on base vectors in a subspace having no orthogonal relation with each other, it is particularly effective where the vectors (preimages) for creating base vectors constituting a subspace are changed, as in the present invention.

Preimage Learning

[0046] Learning of preimage x_(i) of a base vector in a subspace is performed as follows. Where, e.g., the steepest descent method is used for learning samples, a₁, . . . , a_(n) are computed using the expression (5). After the obtained a₁, . . . , a_(n) are assigned to the expression (3), a change of base vector Φ(x_(i)) is found as follows by the steepest descent method.

$\Delta \Phi(x_{i}) = -\eta \frac{\partial E}{\partial \Phi(x_{i})} = 2 \eta\, a_{i} \left\{ \Phi(x) - \sum_{j=1}^{n} a_{j} \Phi(x_{j}) \right\} \qquad (6)$

[0047] A change of preimage x_(i) corresponding to the change of Φ(x_(i)) is represented by

$\Delta x_{i} = \left( \frac{\partial \Phi(x_{i})}{\partial x_{i}} \right)^{-1} \Delta \Phi(x_{i}) \qquad (7)$

where

$\frac{\partial \Phi(x_{i})}{\partial x_{i}}$

[0048] is a matrix with d_(Φ) rows and d columns, and

$\left( \frac{\partial \Phi(x_{i})}{\partial x_{i}} \right)^{-1}$

[0049] is an inverse matrix thereof. Generally, since the nonlinear mapping Φ is an n-to-1 mapping, an inverse matrix

$\left( \frac{\partial \Phi(x_{i})}{\partial x_{i}} \right)^{-1}$

[0050] does not exist, but if the inverse matrix is approximated by a pseudo inverse matrix,

$\left( \frac{\partial \Phi(x_{i})}{\partial x_{i}} \right)^{-1} \cong \left( \left( \frac{\partial \Phi(x_{i})}{\partial x_{i}} \right)^{T} \frac{\partial \Phi(x_{i})}{\partial x_{i}} \right)^{-1} \left( \frac{\partial \Phi(x_{i})}{\partial x_{i}} \right)^{T} = \left( g_{ab}(x_{i}) \right)^{-1} \left( \frac{\partial \Phi(x_{i})}{\partial x_{i}} \right)^{T} \qquad (8)$

[0051] is obtained, where a matrix g_(ab)(x_(i)) with d rows and d columns is a metric tensor and represented as

$g_{ab}(x_{i}) = \left. \frac{\partial}{\partial x_{i}^{a}} \frac{\partial}{\partial x_{i}^{\prime b}} K(x_{i}, x_{i}^{\prime}) \right|_{x_{i} = x_{i}^{\prime}} \qquad (9)$

[0052] using a kernel function. If the expressions (6) and (8) are assigned to the expression (7), the following expression is obtained.

$\Delta x_{i} = 2 \eta\, a_{i} \left( g_{ab}(x_{i}) \right)^{-1} \left\{ \frac{\partial}{\partial x_{i}} K(x, x_{i}) - \sum_{j=1}^{n} a_{j} \frac{\partial}{\partial x_{i}} K(x_{i}, x_{j}) \right\} \qquad (10)$

[0053] Learning is performed by serially updating preimage x_(i) of base vector using the expression (10).
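The update of the expression (10) can be illustrated for one concrete kernel. The sketch below assumes a Gaussian kernel K(x, y) = exp(-||x - y||² / (2σ²)), for which the expression (9) gives g_(ab) = δ_(ab)/σ², so that the inverse metric is simply σ² times the identity. The kernel choice, the function names, the learning rates `xi` and `eta`, and the sample data are all assumptions for illustration.

```python
import math

def kernel(x, y, sigma=1.0):
    """Assumed Gaussian kernel."""
    sq = sum((p - q) ** 2 for p, q in zip(x, y))
    return math.exp(-sq / (2.0 * sigma ** 2))

def fit_coefficients(x, pre, sigma=1.0, xi=0.1, steps=300):
    """Projection coefficients a_i by the steepest descent of expression (5)."""
    n = len(pre)
    kx = [kernel(x, p, sigma) for p in pre]
    gram = [[kernel(p, q, sigma) for q in pre] for p in pre]
    a = [0.0] * n
    for _ in range(steps):
        a = [a[i] + 2.0 * xi * (kx[i] - sum(a[j] * gram[i][j] for j in range(n)))
             for i in range(n)]
    return a

def squared_distance(x, pre, a, sigma=1.0):
    """Expression (4) for the given preimages and coefficients."""
    n = len(pre)
    cross = sum(a[i] * kernel(x, pre[i], sigma) for i in range(n))
    quad = sum(a[i] * a[j] * kernel(pre[i], pre[j], sigma)
               for i in range(n) for j in range(n))
    return kernel(x, x, sigma) - 2.0 * cross + quad

def update_preimages(x, pre, a, sigma=1.0, eta=0.05):
    """One step of expression (10), specialised to the Gaussian kernel."""
    n, d = len(pre), len(x)
    updated = []
    for i in range(n):
        k_x = kernel(x, pre[i], sigma)
        row = []
        for b in range(d):
            # dK(x, x_i)/dx_i = K(x, x_i) (x - x_i) / sigma^2
            grad = k_x * (x[b] - pre[i][b]) / sigma ** 2
            # dK(x_i, x_j)/dx_i = K(x_i, x_j) (x_j - x_i) / sigma^2
            grad -= sum(a[j] * kernel(pre[i], pre[j], sigma)
                        * (pre[j][b] - pre[i][b]) / sigma ** 2 for j in range(n))
            # (g_ab)^-1 = sigma^2 * identity for this kernel
            row.append(pre[i][b] + 2.0 * eta * a[i] * sigma ** 2 * grad)
        updated.append(row)
    return updated

x = [0.5, 0.8]                      # hypothetical learning sample
pre = [[0.0, 0.0], [1.0, 0.0]]      # hypothetical preimages
a = fit_coefficients(x, pre)
new_pre = update_preimages(x, pre, a)
```

Only the current sample and the n preimages are touched in each step, which is why earlier learning samples need not be stored.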

[0054] Since computation for the above learning is performed by updating preimages of base vectors, the amount of the computation is very small in comparison with conventional cases where all learning samples are required. Even in the case where learning samples are added after the learning is completed, since the learning may be performed for the added learning samples, additional learning is easy. The amount of computation for the above learning is very small because operations in a high dimensional non-linear space are not required.

Updating Kernel Function

[0055] A description will be made of a method of learning a kernel function. Before learning is started, well-known functions such as Gauss function kernels and polynomial kernels are set. During learning, a kernel function is deformed by a conformal mapping

$\tilde{K}(x, y) = C(x) C(y) K(x, y) \qquad (11)$

[0056] C(x) is changed so that variations of coefficient a_(i) for learning samples become uniform for any coefficient a_(i). Specifically, where variations of coefficient a_(i) are larger than a preset value, the value of C(x) regarding the vicinity of preimage x_(i) of a base vector of a subspace corresponding to the coefficient a_(i) is increased. This enlarges a space in the vicinity of x_(i) in a nonlinear characteristic space by

$\tilde{g}_{ab}(x) = \frac{\partial C(x)}{\partial x^{a}} \frac{\partial C(x)}{\partial x^{b}} + C(x)^{2} g_{ab}(x) \qquad (12)$

[0057] Accordingly, the number of learning samples having coefficient a_(i) of a large value decreases relatively, and variations for a learning sample of coefficient a_(i) decrease. Conversely, where variations of coefficient a_(i) are smaller than a preset value, the value of C(x) regarding the vicinity of preimage x_(i) of a base vector of a subspace corresponding to the coefficient a_(i) is decreased.

[0058] C(x) regarding preimage vicinity is increased or decreased by, e.g., an expression (15).

[0059] By a variation of a kernel function as described above, a kernel function adapted to learning samples is obtained, so that base vectors Φ(x_(i)) in a subspace represent an approximately equal number of learning samples. This increases recognition capability. As implied from the expression (10), smaller g_(ab)(x) makes variations of coefficient a_(i) smaller because changes of input space vector x_(i) characterizing subspaces by learning are larger, that is, x_(i) less responsive to learning samples speeds up learning. On the other hand, x_(i) more responsive to learning samples makes changes of input space vector x_(i) smaller and brings about a stable condition. This yields the effect of a short learning period and excellent convergence.

[0060] Although the above method produces only the values of C(x_(i)) in the vicinity of preimages, the values of C(x) in other locations can be obtained through extrapolation from these values by the following expression

$C(x) = \frac{\sum_{i=1}^{n} a_{i} C(x_{i})}{\sum_{i=1}^{n} a_{i}} \qquad (13)$
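The deformation of the expression (11) and the extrapolation of the expression (13) can be sketched as follows. The function names and the Gaussian base kernel are assumptions for illustration. The fallback value returned when the coefficients sum to nearly zero is also an assumption added here, since the original text does not say how that case is handled.

```python
import math

def gaussian_kernel(x, y, sigma=1.0):
    """Assumed base kernel K(x, y) before deformation."""
    sq = sum((p - q) ** 2 for p, q in zip(x, y))
    return math.exp(-sq / (2.0 * sigma ** 2))

def extrapolate_c(a, c_at_preimages, eps=1e-12):
    """Expression (13): average of C(x_i) weighted by the projection
    coefficients a_i of the input x."""
    total = sum(a)
    if abs(total) < eps:
        # Assumed fallback, not from the source: leave the kernel undeformed.
        return 1.0
    return sum(ai * ci for ai, ci in zip(a, c_at_preimages)) / total

def deformed_kernel(x, y, c_x, c_y, kernel=gaussian_kernel):
    """Expression (11): deformed kernel value C(x) C(y) K(x, y)."""
    return c_x * c_y * kernel(x, y)

# Hypothetical coefficients of an input x and scale factors at two preimages.
c_x = extrapolate_c([1.0, 3.0], [2.0, 4.0])
```

If every C(x_(i)) has the same value, the extrapolated C(x) equals that value for any coefficients with a nonzero sum, so a uniform scale factor leaves the relative geometry unchanged.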

Recognition Processing Procedure

[0061] Next, a detailed procedure for pattern recognition processing according to the present invention will be described using a flowchart of FIG. 1. Assume that base vectors Φ(x_(i)) of a subspace of each category and a kernel function have been learned according to a procedure described later. In S101, pattern data to be recognized is inputted. The data is multidimensional data inputted from various sensors, such as image or sound data, or multidimensional pattern obtained by converting the data. As learning patterns and unknown patterns, files stored on computer storage media and files obtained from networks can be used, and an interface with equipment for the files can be used as a pattern input device. In S102, projection a_(i) to a nonlinear subspace of each category of an input pattern is calculated. To do this, a_(i) is set to an initial value by, e.g.,

$a_{i} = \frac{K(x, x_{i})}{K(x_{i}, x_{i})} \qquad (14)$

[0062] and serially updated by the expression (5). Each time a_(i) is updated, an evaluation value E is calculated by the expression (4), and a_(i) is updated until changes of E become below 1% or a predetermined number of times is reached. In S103, an evaluation value E calculated for each category is compared with a threshold value, and if, in S104, a category for which the evaluation value E is below the threshold value exists, control proceeds to S105. In S105, a category for which the evaluation value E is the smallest is outputted as recognition results. If there is no category for which the evaluation value E is below the threshold value, recognition becomes unsuccessful and control proceeds to S106. In S106, a teaching signal indicative of a category of the input pattern is inputted. The teaching may be manually given or created analogically from other information. After the teaching signal is inputted, control proceeds to S204 in FIG. 2 for additional learning described later.
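Steps S102 to S106 can be sketched as below. This is a minimal illustration under stated assumptions: a Gaussian kernel, the 1% criterion read as a relative change of E between successive updates, and hypothetical names (`evaluate`, `recognize`, `threshold`, `max_iter`) and sample data. Returning `None` stands in for the branch to S106, where a teaching signal would be requested.

```python
import math

def gaussian_kernel(x, y, sigma=1.0):
    """Assumed kernel for the sketch."""
    sq = sum((p - q) ** 2 for p, q in zip(x, y))
    return math.exp(-sq / (2.0 * sigma ** 2))

def evaluate(x, preimages, kernel, xi=0.1, tol=0.01, max_iter=1000):
    """S102: evaluation value E for one category."""
    n = len(preimages)
    kx = [kernel(x, p) for p in preimages]
    gram = [[kernel(p, q) for q in preimages] for p in preimages]
    kxx = kernel(x, x)

    def energy(a):  # expression (4)
        cross = sum(a[i] * kx[i] for i in range(n))
        quad = sum(a[i] * a[j] * gram[i][j] for i in range(n) for j in range(n))
        return kxx - 2.0 * cross + quad

    a = [kx[i] / gram[i][i] for i in range(n)]  # initial value, expression (14)
    e = energy(a)
    for _ in range(max_iter):
        a = [a[i] + 2.0 * xi * (kx[i] - sum(a[j] * gram[i][j] for j in range(n)))
             for i in range(n)]                 # update, expression (5)
        e_new = energy(a)
        converged = abs(e - e_new) <= tol * abs(e)  # change of E below 1%
        e = e_new
        if converged:
            break
    return e

def recognize(x, subspaces, kernel, threshold):
    """S103 to S106: smallest E below the threshold, otherwise None."""
    scores = {c: evaluate(x, pre, kernel) for c, pre in subspaces.items()}
    best = min(scores, key=scores.get)
    return best if scores[best] < threshold else None

# Hypothetical preimages for two categories and an assumed threshold.
subspaces = {"A": [[0.0, 0.0], [1.0, 0.0]], "B": [[5.0, 5.0], [6.0, 5.0]]}
result = recognize([0.1, 0.0], subspaces, gaussian_kernel, threshold=0.5)
```

An input far from every set of preimages yields E close to K(x, x) for all categories, so it falls through to the rejection branch rather than being forced into the nearest category.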

[0063] Using a flowchart of FIG. 2, a description will be made of a procedure for learning base vectors Φ(x_(i)) of a nonlinear subspace of each category and a kernel function in the present invention. The number of base vectors Φ(x_(i)) of a nonlinear subspace is specified beforehand. Initial values of the base vectors Φ(x_(i)) are given by random numbers. The kernel function is set beforehand to a known kernel such as a Gaussian kernel or polynomial kernel. In S201, a learning sample is inputted. The learning sample is pattern data of the same kind as that used in the recognition processing. In S202, the category to which the learning sample belongs is inputted as a teaching signal to select a nonlinear subspace of the category to be learned. In S203, projection a_(i) of the learning pattern to the selected nonlinear subspace is calculated according to the same procedure as in the recognition processing. In S204, the base vectors of the nonlinear subspace are updated according to the expression (10). In S205, the kernel function is deformed based on the expression (11). C(x) of the expression (11) is updated based on

$\tilde{C}(x) = \left( \frac{\sigma_{a_{i}}}{\sigma_{conv}} \right)^{\alpha} C(x) \qquad (15)$

[0064] so that variation σ_(ai) of projection a_(i) to all learning samples converges to a given value σ_(conv). α is a positive constant. σ_(conv) may be set beforehand, or autonomously changed to an average value of σ_(ai) of each base. Since the above method gives C(x) only to preimages of bases of nonlinear subspace, C(x) for a given input x is obtained through extrapolation from the values of C(x) at the preimages. The above learning continues until an evaluation value E for the learning sample becomes sufficiently small.
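The update of the expression (15) can be sketched as follows. The sketch reads the "variation" σ_(ai) as the population standard deviation of a_(i) over the learning samples, which is an interpretation and not stated in the original text. The function name, the argument layout, and the default `alpha` are likewise assumptions. Passing `sigma_conv=None` uses the average of σ_(ai) over the bases, corresponding to the autonomous setting mentioned above.

```python
import statistics

def update_scale_factors(c, coeff_history, sigma_conv=None, alpha=0.5):
    """Expression (15) applied at each preimage.

    c[i]             current C(x_i)
    coeff_history[i] values of a_i observed over the learning samples
    sigma_conv       target variation; None means the average over bases
    """
    # Assumed reading of "variation": population standard deviation.
    sigmas = [statistics.pstdev(hist) for hist in coeff_history]
    if sigma_conv is None:
        sigma_conv = statistics.mean(sigmas)
    return [(s / sigma_conv) ** alpha * ci for s, ci in zip(sigmas, c)]

# Hypothetical histories of a_1 and a_2 over two learning samples.
new_c = update_scale_factors([1.0, 1.0], [[0.0, 2.0], [0.0, 4.0]],
                             sigma_conv=1.0, alpha=1.0)
```

A base whose coefficient varies more than the target receives a larger C(x_(i)), and one that varies less receives a smaller value, matching the direction of adjustment described for the kernel deformation.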

[0065] The additional learning may start from S204 because a teaching signal has been obtained in the recognition processing in FIG. 1 and projection a_(i) to a nonlinear subspace has been calculated.

[0066] FIG. 4 is a drawing showing a result of recognizing an image pattern by the present invention. An input pattern is an image of 27 by 27 pixels belonging to one of three categories shown in FIG. 3. FIG. 4 shows an evaluation value E for a nonlinear subspace in which learning was performed with learning samples of category 1. An evaluation value E for input of the category 1 is about one hundredth of evaluation values E for input of other categories, indicating that the pattern can be recognized. FIG. 5 shows computation time required for recognition processing by the present invention along with comparison with the conventional kernel nonlinear subspace method. The horizontal axis indicates the number of patterns used for learning and the vertical axis indicates computation time required for recognition processing. It will be understood that the present pattern recognition method enables recognition with computation time of about one hundredth in comparison with the conventional method.

[0067] FIG. 6 shows an example of configuring hardware so that parallel processing can be performed on a category basis to increase computation speed. A pattern input unit 301 inputs a pattern signal. A teaching signal input unit 302 inputs a teaching signal. For each category, nonlinear space learning units 303-1 to 303-n, and nonlinear projection computation units 304-1 to 304-n are provided. The nonlinear projection computation units 304-1 to 304-n are respectively provided with subspace projection computation units 305-1 to 305-n to which a pattern signal inputted from the pattern input unit 301 is inputted, and evaluation value computation units 306-1 to 306-n to which projection computation results are inputted and which determine a relation with the category, and projection variation computation units 307-1 to 307-n to which projection computation results are inputted. The nonlinear space learning units 303-1 to 303-n are respectively provided with base vector updating units 308-1 to 308-n that update base vectors, based on the projection of the inputted learning pattern and a teaching signal, and kernel function updating units 309-1 to 309-n that update a kernel function according to the degree of projection variations. The base vector updating units 308-1 to 308-n have an internal storing part for storing, e.g., preimage data. Of course, the preimage data may also be stored in other storing units. The kernel function updating units 309-1 to 309-n also have an internal storing unit for storing data on a kernel function. Of course, the data on a kernel function may also be stored in other storing units. Evaluation results judged for each category are inputted to an evaluation result comparing unit 310, which determines to which category the inputted pattern belongs, and outputs the result.
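The per-category parallelism of FIG. 6 has a loose software analogue, sketched below with one worker per category and a final comparison step. This is only an illustration of the structure, not the hardware configuration described: it assumes a Gaussian kernel, a fixed number of descent steps, and hypothetical names and data. In standard Python, threads may not actually speed up this pure-Python arithmetic.

```python
import math
from concurrent.futures import ThreadPoolExecutor

def gaussian_kernel(x, y, sigma=1.0):
    """Assumed kernel for the sketch."""
    sq = sum((p - q) ** 2 for p, q in zip(x, y))
    return math.exp(-sq / (2.0 * sigma ** 2))

def evaluation_value(x, preimages, xi=0.1, steps=300):
    """Role of one projection and evaluation unit: E for one category."""
    n = len(preimages)
    kx = [gaussian_kernel(x, p) for p in preimages]
    gram = [[gaussian_kernel(p, q) for q in preimages] for p in preimages]
    a = [0.0] * n
    for _ in range(steps):  # steepest descent of expression (5)
        a = [a[i] + 2.0 * xi * (kx[i] - sum(a[j] * gram[i][j] for j in range(n)))
             for i in range(n)]
    cross = sum(a[i] * kx[i] for i in range(n))
    quad = sum(a[i] * a[j] * gram[i][j] for i in range(n) for j in range(n))
    return gaussian_kernel(x, x) - 2.0 * cross + quad  # expression (4)

def recognize_parallel(x, subspaces):
    """Evaluate every category concurrently, then compare the results."""
    with ThreadPoolExecutor(max_workers=len(subspaces)) as pool:
        futures = {c: pool.submit(evaluation_value, x, pre)
                   for c, pre in subspaces.items()}
        scores = {c: f.result() for c, f in futures.items()}
    return min(scores, key=scores.get), scores

# Hypothetical preimages for two categories.
subspaces = {"A": [[0.0, 0.0], [1.0, 0.0]], "B": [[5.0, 5.0], [6.0, 5.0]]}
label, scores = recognize_parallel([5.2, 5.1], subspaces)
```

The categories share no state during evaluation, which is what makes the per-category split straightforward.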

[0068] It will be easily understood that this configuration also permits the same processing described using FIGS. 1 and 2 to be performed.

[0069] As has been described above, the present invention is provided as a computer program for executing the above-described algorithm, and also may be configured on dedicated hardware as described above.

[0070] As has been described above, according to a specific configuration of the present invention, not all learning samples need to be used to recognize complicated patterns, recognition is enabled with a small amount of computation, and a memory for storing learning samples is not required. Also, additional learning in which learning samples are newly added is easy. Moreover, a kernel function can be adaptively changed, offering the effects of increased recognition accuracy and faster learning.

[0071] As has been described above, the present invention can decrease the amount of computation for pattern recognition and adapt flexibly to an increase and change in learning samples.

[0072] The entire disclosure of Japanese Patent Application No. 2000-390459 filed on Dec. 22, 2001 including specification, claims, drawings and abstract is incorporated herein by reference in its entirety.

What is claimed is:
 1. A pattern recognition method, comprising:calculating an evaluation value by using a set of vectors obtained bymapping a set of vectors conformed to at least one learning sample in aninput space respectively to a nonlinear space defined by a kernelfunction as a set of base vectors constituting each of plural subspacesin the nonlinear space, defined for each of categories into which apattern is classified, the evaluation value representing a relationbetween the plural subspaces represented by a linear combination ofcorresponding sets of the base vectors and mapping of an unknown inputpattern to the nonlinear space; and recognizing a category to which theunknown input pattern belongs, based on the evaluation value.
 2. Thepattern recognition method according to claim 1, wherein the evaluationvalue calculation includes the steps of: obtaining projection of themapping of the unknown input pattern to the nonlinear space to asubspace of each of the categories by making serial computations by asteepest descent method on a distance between the mapping of the inputpattern to the nonlinear space and the subspaces; and calculating thedistance from the obtained projection, and the category recognitionincludes the step of: determining that the input pattern belongs to acategory where the distance from the subspace is minimum.
 3. A method oflearning base vectors constituting subspaces used for patternrecognition, comprising: deciding, for each mapping of a learningpattern to a nonlinear space defined by a kernel function, projection toa subspace that corresponds to a category into which the learningpattern is classified, the subspace being represented by a linearcombination of a set of base vectors created by a mapping of a set ofvectors, respectively, in an input space to the nonlinear space; andupdating the vectors in the input space in which the base vectors arecreated, and increasing a relation between the mapping of the learningpattern obtained by the decided projection to the nonlinear space andthe subspace corresponding to the category into which the learningpattern is classified.
 4. The method of learning base vectorsconstituting subspaces used for pattern recognition according to claim3, wherein a distance of a mapping of the learning pattern to thenonlinear space from the subspace is serially calculated by a steepestdescent method to decide the projection, and the vectors are updated soas to decrease a distance between the mapping of the learning patternobtained by the decided projection to the nonlinear space and thesubspace corresponding to the category into which the learning patternis classified.
 5. A method of deforming a kernel function, comprisingthe steps of: setting a kernel function for defining a mapping of apattern in an input space to a nonlinear space which comprises subspaceseach defined for each of categories to which the pattern is classified;calculating a relation between the mapping of a learning pattern in theinput space to the nonlinear space and the subspaces; and deforming thekernel function according to the result of the calculation of therelation.
 6. The method of deforming the kernel function according toclaim 5, wherein the relation is calculated based on variations ofprojection components of the mapping of the learning pattern in theinput space to the nonlinear space to base vectors of the subspaces, andthe kernel function is deformed based on the calculation result and apredetermined threshold value.
 7. The method of deforming the kernelfunction according to claim 5, wherein: the base vectors are obtained bymapping a set of vectors conformed to at least one learning sample in aninput space respectively to a nonlinear space by the kernel function; arelation between the mapping of the learning pattern to the nonlinearspace and the subspaces is obtained as variations of projectioncomponents of the mapping to the base vectors of the subspaces; and thekernel function is deformed, for the base vectors large in thevariations, to increase a scale of the kernel function in the vicinityof the input space in which the base vectors are created, and for thebase vectors small in the variations, to decrease the scale of thekernel function in the vicinity of the input space in which the basevectors are created.
 8. The method of deforming the kernel function according to claim 6, wherein the function form of a pre-learning kernel function is subjected to a scale conversion, based on the variations of projection components of the base vectors, to deform the kernel function.
 9. The method of learning base vectors constituting subspaces used for pattern recognition according to claim 3, comprising the steps of: obtaining a relation between the mapping of an input pattern to the nonlinear space and a subspace of each of the categories; and if relations between the mapping of the input pattern to the nonlinear space and all the subspaces are lower than a predetermined value, presenting a teaching signal indicative of a category to which the input pattern belongs and learning base vectors of a subspace corresponding to the category.
 10. The method of learning base vectors constituting subspaces used for pattern recognition according to claim 3, comprising the steps of: obtaining projection of the mapping of an input pattern to the nonlinear space to a subspace of each of the categories by making serial computations by the steepest descent method on a distance between the mapping of the input pattern to the nonlinear space and the subspaces; and if distances between the mapping of the input pattern to the nonlinear space and all the subspaces are larger than a desired value, presenting a teaching signal indicative of a category to which the input pattern belongs, and learning base vectors of a subspace corresponding to the category.
 11. A storage medium readable by a computer, the storage medium storing a program of instructions executable by the computer to perform a function for pattern recognition, the function comprising the steps of: calculating an evaluation value by using a set of vectors obtained by mapping a set of vectors conformed to at least one learning sample in an input space respectively to a nonlinear space defined by a kernel function as a set of base vectors constituting each of plural subspaces in the nonlinear space, defined for each of categories into which a pattern is classified, the evaluation value representing a relation between the plural subspaces represented by a linear combination of corresponding sets of the base vectors and mapping of an unknown input pattern to the nonlinear space; and recognizing a category to which the unknown input pattern belongs, based on the evaluation value.
 12. A pattern recognition apparatus, comprising: an input part that inputs patterns; an evaluation value calculating part that uses a set of vectors obtained by mapping a set of vectors conformed to at least one learning sample in an input space respectively to a nonlinear space defined by a kernel function as a set of base vectors constituting a subspace in the nonlinear space, defined for each of categories into which a pattern is classified, to calculate an evaluation value representative of a relation between the plural subspaces represented by a linear combination of corresponding sets of the base vectors, and mapping of an unknown input pattern to the nonlinear space; and a category recognition part that recognizes a category to which the unknown input pattern belongs, based on the evaluation value.
 13. The pattern recognition apparatus according to claim 12, comprising: a dedicated processing device that performs parallel processing for each of the categories.
 14. The pattern recognition apparatus according to claim 12, comprising: a base vector updating part that decides projection of mapping of a learning pattern inputted from the input part to the nonlinear space to a subspace corresponding to a category into which the learning pattern is classified, and updates the base vectors to increase a relation between the mapping of the learning pattern obtained by the decided projection to the nonlinear space and the subspace corresponding to the category into which the learning pattern is classified.
 15. The pattern recognition apparatus according to claim 12, comprising: a vector updating part that decides projection of mapping of a learning pattern inputted from the input part to the nonlinear space to a subspace corresponding to a category into which the learning pattern is classified, by serial computations by a steepest descent method on a distance between the mapping of the learning pattern to the nonlinear space and the subspaces, and updates, by the steepest descent method, the vectors of the input space in which the base vectors are created, so as to decrease a distance between the mapping of the learning pattern to the nonlinear space obtained by the decided projection and the subspace corresponding to the category into which the learning pattern is classified.
 16. The pattern recognition apparatus according to claim 12, comprising: a kernel function deforming part that calculates a relation between mapping of at least one learning pattern in the input space to the nonlinear space and the subspaces by using the kernel function, and deforms the kernel function according to the result of calculation of the relation.
 17. The pattern recognition apparatus according to claim 12, comprising: a kernel function deforming part that calculates variations of projection components of mapping of at least one learning pattern in the input space to the nonlinear space to base vectors of the subspaces by the kernel function, and deforms the kernel function so that the calculation of the variations results in a predetermined value.
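
The distance-based evaluation recited in claims 10 and 12 can be illustrated by the following minimal sketch, which is not the claimed implementation. It assumes a Gaussian kernel, uses the squared distance between the mapping of the input pattern and each subspace as the evaluation value, and finds the projection coefficients by steepest descent. The names gaussian_kernel, project, squared_distance and recognize, and the parameters sigma, steps and threshold, are hypothetical and do not appear in the specification.

```python
import math


def gaussian_kernel(x, y, sigma=1.0):
    """Assumed kernel: k(x, y) = exp(-||x - y||^2 / (2 sigma^2))."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-d2 / (2.0 * sigma ** 2))


def squared_distance(x, bases, coeffs, kernel=gaussian_kernel):
    """Squared distance between the mapping of x and sum_i b_i * Phi(v_i),
    expanded using kernel values only."""
    cross = sum(b * kernel(x, v) for b, v in zip(coeffs, bases))
    gram = sum(
        bi * bj * kernel(vi, vj)
        for bi, vi in zip(coeffs, bases)
        for bj, vj in zip(coeffs, bases)
    )
    return kernel(x, x) - 2.0 * cross + gram


def project(x, bases, steps=200, kernel=gaussian_kernel):
    """Steepest descent on the squared distance with respect to the
    projection coefficients b, starting from b = 0."""
    n = len(bases)
    kx = [kernel(x, v) for v in bases]
    K = [[kernel(vi, vj) for vj in bases] for vi in bases]
    # Step size chosen for kernels with k(v, v) = 1: the largest eigenvalue
    # of K is at most its trace n, so eta = 0.5 / n keeps the iteration stable.
    eta = 0.5 / n
    b = [0.0] * n
    for _ in range(steps):
        grad = [
            2.0 * (sum(K[i][j] * b[j] for j in range(n)) - kx[i])
            for i in range(n)
        ]
        b = [bi - eta * g for bi, g in zip(b, grad)]
    return b


def recognize(x, subspaces, threshold):
    """Return (category, scores). The category is None when every distance
    is at or above the threshold, i.e. when a teaching signal would be requested."""
    scores = {}
    for category, bases in subspaces.items():
        coeffs = project(x, bases)
        scores[category] = squared_distance(x, bases, coeffs)
    best = min(scores, key=scores.get)
    return (best if scores[best] < threshold else None), scores
```

In this sketch, subspaces is a dictionary mapping each category label to the list of input-space vectors whose mappings serve as its base vectors. A return value of None for the category corresponds to the case in which additional learning with a teaching signal would be triggered.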